Abstract

In this paper, we will assess the performance of a 
data-driven anomaly detection algorithm, the In- 
ductive Monitoring System (IMS), which can be 
used to detect simulated Thrust Vector Control 
(TVC) system failures. However, the ability of 
IMS to detect these failures in a true operational 
setting may be related to the realistic nature of 
how they are simulated. As such, we will investi- 
gate both a low fidelity and high fidelity approach 
to simulating such failures, with the latter based 
upon the underlying physics. Furthermore, the 
ability of IMS to detect anomalies that were pre- 
viously unknown and not previously simulated 
will be studied in earnest, as well as apparent de- 
ficiencies or misapplications that result from us- 
ing the data-driven paradigm. Our conclusions 
indicate that robust detection performance of sim- 
ulated failures using IMS is not appreciably af- 
fected by the use of a high fidelity simulation. 
However, we have found that the inclusion of a 
data-driven algorithm such as IMS into a suite of 
deployable health management technologies does 
add significant value. 

1 INTRODUCTION 

In preparation for the launch of Ares I-X, a data-driven 
anomaly detection algorithm was deployed as part of a 
suite of several software tools for inclusion in a ground 
diagnostics prototype to support detection and diagno- 
sis of potential anomalies or failures during the pre- 
launch phase. The selected data-driven anomaly detec- 
tion algorithm, IMS (Inductive Monitoring System), is 
based on incremental clustering, and operates with a 
semi-supervised anomaly detection paradigm, as defined
in previous work (Chandola et al., 2009). This
implies complete reliance on training data of only the
nominal class. As such, the training data is implicitly
labeled, and there are no labels for the anomalous
class. The clustering is performed in an unsupervised 
manner, and any monitored data points falling outside 
of the clusters are flagged as anomalous. Detailed de- 
scriptions of how IMS performs anomaly detection are 
provided in previous work (Iverson, 2004), (Iverson et
al., 2009), (Martin, 2010).

Due to the lack of available nominal and fault data 
with which to validate and test the algorithm for Ares 
I-X, data from the Thrust Vector Control (TVC) Sys- 
tem from previous Space Shuttle missions was used. 
The data collected served two purposes: as nominal 
data, and fault data which was constructed by seed- 
ing nominal data with failures of various types, sever- 
ity, and fidelity for subsequent validation and testing. 
However, the ability of IMS to detect true failures may
be influenced by the realism of how they are
simulated and subsequently tested. As such, a signif- 
icant portion of this paper will be dedicated to inves- 
tigating a computationally efficient approach to sim- 
ulating such failures, and observing the effect of the 
increased fidelity on detection performance, extending 
what was presented in previous work (Martin et al.,
2010).

IMS was one of several data-driven anomaly de- 
tection tools that were evaluated for inclusion as 
part of the suite of technologies to be demonstrated 
during the Ares I-X test launch, which included 
both model-based and rule-based technologies. Data- 
driven algorithms are just one of three different types 
of algorithms that were deployed, the details of 
which were presented in previous work (Iverson et 
al., 2009), (Schwabacher and Waterman, 2008) and 
(Schwabacher et al., 2010a). The other two types 
of algorithms that were deployed include a “rule- 
based” expert system, and a “model-based” system. 
Within these two categories, the deployable candi- 
dates were selected based upon their flight heritage 
and system certifiability. For the rule-based sys- 
tem, SHINE (Spacecraft Health Inference Engine) 
(James and Atkinson, 1990) was selected for deploy- 
ment, which is used within two components of BEAM 
(Beacon-based Exception Analysis for Multimissions)
(Mackey et al., 2001). Other components of BEAM
include various data-driven algorithms. BEAM is a 
patented technology developed at NASA’s JPL (Jet 
Propulsion Laboratory). SHINE serves to aid in the 
management and identification of operational modes. 
For the “model-based” system, a commercially avail- 
able package developed by QSI (Qualtech Systems, 
Inc.), TEAMS (Testability Engineering and Mainte- 
nance System) was highlighted in work subsequent 
to its debut (Cavanaugh, 2001), and was selected for 
deployment to aid in diagnosis. In the context of 
this particular deployment, distinctions among the use 
of the terms “data-driven,” “rule-based,” and “model- 
based,” can be found in the previously cited paper 
(Schwabacher and Waterman, 2008). In the final de- 
ployed software package, we integrated TEAMS with 
IMS in the Ares I-X Ground Diagnostics Prototype 
(GDP) by running the two in parallel and displaying 
the outputs of both tools on the same console, and we 
used SHINE to provide the inputs to TEAMS. 

In this effort, it is of great importance to provide for 
a robust and accurate detection of a variety of known 
fault modes that span a number of different rates of 
progression and severity. However, this capability is 
already well-provided for by other model-based tools 
(i.e. TEAMS) within the suite of deployed tools. IMS 
should be able to detect these, as well as unknown
faults or anomalies that otherwise may not have been
modeled, because it works from a top-down, data-driven
perspective rather than a bottom-up, model-based perspective.
A review of the resulting performance of the entire de- 
ployed package has also been provided (Schwabacher 
et al., 2010a), (Schwabacher et al., 2010b). Other re- 
lated work covering similar topics is also available in 
the literature (Iverson et al., 2009), (Park et al., 2002), 
(Pisanich et al., 2006), (Rao et al., 2009). 

Some advantages that IMS has over the model- 
based and rule-based algorithms include the fact that: 

1. It has the ability to detect anomalies that were 
previously unknown and not previously simulated 
or accounted for. 

2. It has the potential to detect anomalies that are 
precursors of faults before a model-based system 
detects the fault. 

3. It does not require a labor-intensive modeling 
process. 

The disadvantages of IMS compared with model- 
based tools are: 

1. It does anomaly detection only, not diagnosis, so 
that additional analysis is necessary to determine 
whether a detected anomaly is significant or not. 

2. It only provides an acceptable level of accuracy if 
it is trained using a sufficient quantity of historical 
and/or simulated training data. 

In previous work (Martin, 2010), we studied three 
candidates to provide the primary role of data-driven 
anomaly detection, which included IMS. Of the three 
algorithms tested, it was found that IMS was the best 
performing algorithm when considering both overall
accuracy, as quantified by the area under the Receiver
Operating Characteristic (ROC) curve (AUC),
and computational complexity. In this paper we aim to
follow up with more detail on the performance of IMS
in its designated primary roles as specified above, ex- 
ploring both its advantages and disadvantages. In do- 
ing so, we will demonstrate that the other model-based 
and rule-based technologies with which IMS was de- 
ployed provided certain capabilities which IMS com- 
plemented well in some cases, while in other cases, 
the performance of IMS was less than desirable due to 
inappropriate use. 

The remainder of this paper will be organized as fol- 
lows: Section 2 will provide a detailed description of 
all simulated failures to be tested, including the higher 
fidelity version based upon physics. Section 3 provides 
a comparative discussion of the performance of IMS 
as it relates to the ability to robustly detect simulated
failures of varying fidelities. Section 4 will provide
a general discussion of the selection of IMS as the
data-driven anomaly detection algorithm, selection of 
parameters, training, validation & testing procedures. 
Both quantitative and qualitative performance results 
for Shuttle and Ares I-X data at the pad and at the Ve- 
hicle Assembly Building (VAB) will also be discussed. 
Section 5 provides a comparative discussion of the per- 
formance of IMS, and the model-based detection and 
diagnostic tool, TEAMS. The final concluding section 
will provide an overall summary and epilogue. 

2 SIMULATED FAILURES 

Historical Space Shuttle data was used to test the en- 
tire Ares I-X ground diagnostic prototype. The Space 
Shuttle Solid Rocket Booster (SRB) TVC is virtually 
identical to the Ares I-X first-stage TVC, so the SRB 
TVC data was expected to be very similar to the Ares 
I-X TVC data. Similarly, the ground hydraulic sys- 
tem used with the SRB TVC is virtually identical to 
the ground hydraulic system used with the Ares I-X 
TVC. These assumptions held up modestly well after 
our post-flight analysis, in consideration of all the tools 
that were deployed to support failure and anomaly de- 
tection. The differences that we found in the data were 
caused by differences in operations between Shuttle 
and Ares I-X, rather than by differences in the TVC 
or HSS hardware. 

The SRB TVC and the associated ground hydraulic 
system have had very few failures. We thus had avail- 
able to us an abundance of nominal data, but very little 
failure data. We therefore decided to develop a set of 
failure simulations that could be used to test the ability 
of the prototype to detect and diagnose failures. We 
inserted simulated failures into the historical Shuttle 
data, and used the resulting data sets to test the proto- 
type before the Ares I-X launch. 

Table 1 provides a summary of the failure modes 
that we simulated for each vehicle location. In order 
to test the integration of the TVC and HSS TEAMS 
models, we decided to select one failure mode that 
can be isolated to the TVC (Failure Mode 1a, FSM
Leak) 1, one that can be isolated to the HSS (Failure


1 The Fuel Supply Module (FSM) leak is an N2H4 (hydrazine)
leak resulting in a pressure drop, and is simulated
within 1 min prior to launch at the pad and within the 34
minute period after the calibration test in the VAB.



Table 1: Failure Mode Summary

Failure Mode Label   Vehicle Location   Failure Mode
1a                   Pad, VAB           FSM Leak
2                    VAB                HPU overheat
3                    Pad, VAB           Hydraulic Leak
4                    VAB                Stuck actuator


Mode 2, HPU overheat) 2, and one that would produce
a TEAMS ambiguity group including both TVC and
HSS candidates (Failure Mode 3, Hydraulic Leak) 3.
In addition, because the actuator positioning test was
considered to be the most important pre-launch test of
the TVC, we decided to simulate a failure during this
test (Failure Mode 4, Stuck actuator) 4. We will only
describe failure mode 1a in the remainder of this section,
as it includes examples of simulations that span
the range of fidelity used for all of the failure modes.
For the remaining failure modes, low fidelity linear
simulations were used, constructed in a similar fashion
to the low fidelity version of failure mode 1a. Furthermore,
although the motivation for selecting these
specific failure modes was based upon support for
testing and integration of TEAMS models, they also
serve as proving grounds for testing the anomaly detection
capability of IMS.

As shown in Table 1, a leak in the fuel supply mod- 
ule can be simulated either at the pad or at the VAB. 
The leak at the pad was simulated to occur between 
Go for GLS Start (at approximately T-31 sec) and Go
for SSME Start (at approximately T-10 sec). The FSM 
pressure is simulated to drop to an off-nominal value 
instead of nominally staying above a specified thresh- 
old. 

Similar to the other simulated failure scenarios, an 
initial attempt at the construction of the FSM failure 
simulation involved the simple use of a linearly de- 
creasing ramp, given a predefined rate of degradation 
from the nominal operating pressure to an off-nominal 
value. This linear simulation was used to support the 
ROC analysis performed in a previous study (Martin, 
2010). However, it is possible to use a higher fidelity 
physics-based simulation for this scenario because all 
of the relevant data is available for its construction. A 
higher fidelity failure scenario may provide a more realistic
test of our algorithm’s ability to detect the failure
in reality. The same failure simulated at the VAB
spans the period of time during which APU (Auxiliary
Power Unit) system checks are conducted. Both low fidelity
(linearly decreasing ramps) and high fidelity (physics-based)
failure simulations for the FSM leak will be used for
analysis of data at both the pad and the VAB, to offer
a fair basis for comparison of how fidelity affects final
performance. Both locations are analyzed because
differences in detection performance between the VAB
and pad may be due to differences in operational procedures
regardless of simulation fidelity.

2 The Hydraulic Pumping Unit (HPU) overheat failure is
an over-temperature failure simulated within a 25 min period
during tests in the VAB.

3 A hydraulic fluid leak will result in a hydraulic fluid
reservoir level drop that is simulated within 1 min prior to
launch at the pad and within the 10 minute period after the
calibration test in the VAB.

4 The actuator is simulated to be stuck during the actuator
positioning test, a 2.5 min test in the VAB.

The FSM pressure will begin dropping from a nom- 
inal value to venting at atmospheric pressure over the 
course of a few minutes. As the FSM pressure drops, 
the FSM pressure sensor will redline on a low value. 
To simulate this failure until the tank is completely voided,
we must account for both fluid phases contained in the FSM:
the liquid hydrazine and the gaseous nitrogen used to
pressurize the spherical tank. The leak in
the FSM will be simulated to evolve according to the 
following assumptions: 

1. Assume that the geometry of the FSM is estab- 
lished according to available documentation. 

2. Assume that the liquid hydrazine (N2H4) is filled
only to the midpoint of the spherical tank.

3. Assume that the leak is below the surface of the 
liquid. 

In order to simulate the FSM leak according to
physics, we will also implicitly use all of the assumptions
that result from applying the unsteady form
of Bernoulli’s equation as presented in (Munson et
al., 1998) to solve the differential equation shown as
Eqn. 1 associated with the initial leak of the liquid hydrazine.
Fig. 1 depicts the leak along with some of the
geometrical constants and subscripted reference points
used in Eqn. 1.




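[Equation not recoverable: unsteady Bernoulli equation applied along streamline s between points 1 and 2]   (1)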


$p_g = p_0 - p_a$ is the gage pressure in the tank, where $p_0$
is the pressure to which the tank is pressurized with
GN2, and $p_a$ is atmospheric pressure. $\rho$ is the density
of liquid hydrazine, and $g$ represents the gravitational
acceleration. $C_d$ is the coefficient of discharge at
the leak point, and $s$ defines the fluid streamline along
which Bernoulli’s equation is being applied. $v_1$ and
$h_1$ define the velocity and height from the ground to
the top of the liquid hydrazine, respectively. Similarly,
$v_2$ and $h_2$ define the exit velocity and height from the
ground to the site of the leak, respectively.

We assume the sphere has radius $r$, and the cross-sectional
disk representing the top surface of the liquid
hydrazine shown in Fig. 1 has a radius of $d$. Since we
are interested in unanticipated decreases in the height
of the liquid hydrazine in the tank, let us define $h = h_1$
as our independent variable to simplify Eqn. 1 for the
one-dimensional case, defined with respect to the reference
$+z$ shown in Fig. 1. Furthermore, we may apply
Eqn. 2 for conservation of mass, and Eqn. 3 defines
the velocity $v_1$ as a function of the height $h_1$. The ideal
gas laws, Eqns. 11-12, are defined for constant temperature
(de)pressurization, and we assume constant acceleration
via Eqn. 4. The geometry defined in Fig. 1 and
auxiliary Eqns. 5-10 involve $h_g$, the distance from the
ground to the bottom of the tank, and $h_r$, the distance
from the top surface of the liquid hydrazine in the tank
to the top of the tank. Thus, the simplified version
of Eqn. 1 results in the differential equation shown as
Eqn. 13.


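$$\dot{m} = \rho A_1 v_1 = \rho A_2 v_2 \tag{2}$$

$$v_1 = \frac{dh_1}{dt} = \frac{dh}{dt} \tag{3}$$

$$\int_{s_1}^{s_2} \frac{\partial v}{\partial t}\, ds = (h_2 - h)\,\frac{dv_1}{dt} \tag{4}$$

$$A_1(h) = \pi\, d(h)^2 \tag{5}$$

$$A_2 = \pi d_0^2 \tag{6}$$

$$V_g(h) = \frac{\pi h_r^2}{3}\,(3r - h_r) \tag{7}$$

$$V_0 = \frac{2\pi}{3}\, r^3 \tag{8}$$

$$d(h) = \sqrt{h_r\,(2r - h_r)} \tag{9}$$

$$h_r = 2r + h_g - h \tag{10}$$

$$p_0 V_0 = m_g R T \tag{11}$$

$$p(t) = \frac{p_0 V_0}{V_g} \tag{12}$$

$$(h_2 - h)\left(\frac{d^2 h}{dt^2} + g\right) = \frac{1}{\rho}\left(\frac{p_0 V_0}{V_g(h)} - p_a\right) + \frac{1}{2}\left(\frac{dh}{dt}\right)^2 \frac{(C_d A_2)^2 - A_1^2(h)}{(C_d A_2)^2} \tag{13}$$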

where $d_0$ represents the radius of the leak area, assumed
to be a round hole, and $R$ represents the ideal
gas constant for GN2. $p(t)$ and $T$ represent the absolute
pressure as the leak evolves as a function of time,
and the absolute temperature of the GN2, respectively. $A_1$
and $A_2$ represent the surface areas of the N2H4/GN2
fluid interface and the round hole through which liquid
hydrazine is leaking, respectively. $V_0$ and $V_g$ are
the initial volume of GN2 and the volume of GN2 as
the leak evolves, respectively. Finally, $\dot{m}$ represents
the mass flow rate of the liquid hydrazine (N2H4), and
$m_g$ represents the total mass of the GN2 in the tank.

An approximation to the resulting differential equation
can be used to yield a separable nonlinear differential
equation that can be solved in closed form, shown
as Eqn. 14. This approximation is applied by recognizing
that the left hand side of Eqn. 13 (quantifying
the gravitational and acceleration terms) is negligible
relative to the right hand side. The gravitational term
is always negligible, and the acceleration term is important
only for quantification of a negligibly small
transient at the very beginning of the leak. Furthermore,
constants characterizing the FSM geometry can
be simplified due to the relative sizes of the leak radius
and the radius of the N2H4/GN2 fluid interface (i.e.
$d_0 \ll d(h)$). The last assumption is that $p_a \ll p(t)$,
which may contribute most to the approximation error
since the tank pressure evolves over time and will not
necessarily always be much greater than atmospheric
pressure. Thus the error may potentially grow over
time as the tank pressure decreases due to evolution of
the leak. However, in general the resulting closed-form
representation will help to relieve the computational
burden associated with numerical methods otherwise
required to solve the differential equation (i.e. a stiff
solver).


$$\frac{dh}{dt} = -\frac{C_d A_2}{A_1(h)} \sqrt{\frac{2 p_0 V_0}{\rho\, V_g(h)}} \tag{14}$$

Note that the negative square root of $(dh/dt)^2$ must
be used in Eqn. 14 in order to yield a real solution.
Furthermore, by recognizing that $dV_g/dh = -A_1(h)$,
Eqn. 14 can be simplified to Eqn. 15.


$$\sqrt{V_g}\,\frac{dV_g}{dt} = C_d A_2 \sqrt{\frac{2 p_0 V_0}{\rho}} \tag{15}$$


Integrating both sides of Eqn. 15 and combining the 
result with Eqn. 12, we may now write the resulting 
closed-form expression for the tank pressure as a function
of time, $p(t)$, shown as Eqn. 16.
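
A plausible form of Eqn. 16, offered as a sketch rather than the authors' exact expression, follows by integrating the separable relation $\sqrt{V_g}\,dV_g/dt = C_d A_2 \sqrt{2 p_0 V_0/\rho}$ from $t = 0$ and substituting into $p(t) = p_0 V_0 / V_g$. The initial condition $V_g(0) = V_0$ (the leak starts at $t = 0$ with the tank half full) is an assumption made here for illustration.

```latex
\begin{align*}
\tfrac{2}{3}\left(V_g^{3/2} - V_0^{3/2}\right)
  &= C_d A_2 \sqrt{\frac{2 p_0 V_0}{\rho}}\; t \\
V_g(t)
  &= \left[ V_0^{3/2} + \tfrac{3}{2}\, C_d A_2 \sqrt{\frac{2 p_0 V_0}{\rho}}\; t \right]^{2/3} \\
p(t) = \frac{p_0 V_0}{V_g(t)}
  &= p_0 V_0 \left[ V_0^{3/2} + \tfrac{3}{2}\, C_d A_2 \sqrt{\frac{2 p_0 V_0}{\rho}}\; t \right]^{-2/3}
\end{align*}
```

Under these assumptions the pressure equals $p_0$ at $t = 0$ and decreases monotonically as the gas volume grows.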



Simulation of the voiding of the remaining gaseous
nitrogen (GN2) in the FSM is performed by use of a
linear first-order approximation of a differential equation
governing the release of an ideal gas as used in
(Tchouvelev et al., 2007). The solution of the differential
equation is shown as Eqn. 17. The mass of
the GN2 was obtained by use of the design condition
(1.1 lbs of gaseous nitrogen at 400 psig as a baseline),
obtained from the seminal paper on the introduction
of the FSM (McCool et al., 1980). It was also assumed
that the GN2 underwent a constant temperature
and constant volume ideal depressurization (bleeding
off tank pressure by operating a GN2 pressurization
valve) from this design condition to the nominal value
that existed at the time the leak was simulated when
in the VAB. The constant temperature assumption also
holds for evolution of the leak from the nominal pressure
value to $p(t) = p_a$.


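[Equation not fully recoverable: $p(t)$ decays exponentially from $p_v$, with an exponent involving $\gamma$, $R$, and $T$]   (17)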


Of course, when $h = h_2$, the liquid hydrazine will
have emptied out to the point that it can no longer escape
from the hole, and only the gaseous nitrogen is
left to escape. We call the pressure at which this occurs
the vent pressure, $p_v$, which can easily be computed
using Eqns. 7, 10, and 12. The time of this event can
be approximated by using Eqn. 16. The corresponding
volume of gas left to be evacuated from the tank is $V_v$,
and $\gamma$ is the ratio of specific heats for GN2. Therefore,
Eqn. 16 governs the release of liquid hydrazine until
the time of the vent pressure. At this point, Eqn. 17
governs the subsequent release of gaseous nitrogen and
complete voiding of the tank, at which point $p(t) = p_a$.
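
The liquid-leak phase described above can be illustrated with a minimal Python sketch. It assumes the closed-form solution obtained by integrating $\sqrt{V_g}\,dV_g/dt = C_d A_2 \sqrt{2 p_0 V_0/\rho}$ from $V_g(0) = V_0$, a spherical-cap gas volume, and isothermal ideal-gas behavior. The function names and every numerical value (tank radius, leak height, discharge coefficient, density, pressure) are hypothetical placeholders rather than FSM design values, and the gas-venting phase of Eqn. 17 is not modeled.

```python
import math


def gas_volume(h, r, h_g):
    """Gas volume above the liquid surface at height h (spherical cap of height h_r)."""
    h_r = 2.0 * r + h_g - h
    return math.pi * h_r ** 2 * (3.0 * r - h_r) / 3.0


def liquid_phase(p0, r, h_g, h2, d0, c_d, rho):
    """Return (p_of_t, p_vent, t_vent) for the liquid-leak phase.

    Assumes the tank starts half full, so V0 = 2*pi*r^3/3, and that the
    approximations leading to the separable equation hold. SI units,
    absolute pressure.
    """
    v0 = 2.0 * math.pi * r ** 3 / 3.0              # initial gas volume
    a2 = math.pi * d0 ** 2                          # area of the round leak hole
    k = c_d * a2 * math.sqrt(2.0 * p0 * v0 / rho)   # right-hand side constant
    v_vent = gas_volume(h2, r, h_g)                 # gas volume once liquid reaches the hole
    p_vent = p0 * v0 / v_vent                       # isothermal ideal gas
    t_vent = (v_vent ** 1.5 - v0 ** 1.5) / (1.5 * k)

    def p_of_t(t):
        """Tank pressure at time t, valid for 0 <= t <= t_vent."""
        v_g = (v0 ** 1.5 + 1.5 * k * t) ** (2.0 / 3.0)
        return p0 * v0 / v_g

    return p_of_t, p_vent, t_vent


# Hypothetical values for illustration only.
pressure, p_vent, t_vent = liquid_phase(
    p0=2.0e6, r=0.2, h_g=1.0, h2=1.05, d0=0.001, c_d=0.6, rho=1000.0
)
print(f"vent pressure {p_vent:.0f} Pa reached after {t_vent:.1f} s")
```

Because the result is closed form, the pressure trace can be evaluated at arbitrary sample times without a numerical ODE solver, which is the computational advantage the text attributes to this approximation.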


3 COMPARATIVE ANALYSIS 

In this section we investigate the effect of increased
simulation fidelity on detection performance. In doing so
we hope to gain a better understanding of any improvement
in the ability of IMS to detect simulated failures
that may be more realistic. In the previous section,
we provided the details for how a high fidelity,
physics-based simulation of a fuel supply module
leak evolves, according to Eqns. 16 and 17.
Using these equations, the time at which the pressure
in the FSM approximately reaches atmospheric pressure
in the high fidelity simulation can
be used to construct the slope of the line associated
with the low fidelity simulation, for a fixed leak radius.
Thus, implicit linearized versions of Eqns. 16
and 17 represent low fidelity simulations. The slope
of the resulting line will determine the rate of degradation,
to be used as a fair basis for comparison to the
nonlinear rate of degradation which evolves according
to physics. Fig. 2 illustrates the times to be used to
construct the slopes of low fidelity linear simulations,
based upon various leak radii that were simulated with
the high fidelity physics-based simulations.

Figure 2: Time to FSM Voiding for Various Leak Radii
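
The construction of a low fidelity ramp from a void time can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `linear_ramp`, its arguments, and the sample values are hypothetical, and the void time is assumed to have been read off the high fidelity simulation for a given leak radius.

```python
def linear_ramp(p_nominal, p_final, t_start, t_void, times):
    """Low fidelity failure: linear drop from p_nominal to p_final.

    The drop begins at t_start and takes t_void seconds, so the slope
    (rate of degradation) is fixed by the high fidelity void time.
    """
    slope = (p_final - p_nominal) / t_void  # negative for a leak
    trace = []
    for t in times:
        if t <= t_start:
            trace.append(p_nominal)          # before the failure is seeded
        elif t >= t_start + t_void:
            trace.append(p_final)            # tank fully voided
        else:
            trace.append(p_nominal + slope * (t - t_start))
    return trace


# Hypothetical example: 400 -> 0 over 20 s, starting at t = 10 s.
print(linear_ramp(400.0, 0.0, 10.0, 20.0, [0, 10, 20, 30, 40]))
```

Matching the end points and duration in this way gives both simulations the same overall degradation time, so any difference in detection performance reflects the shape of the pressure trace rather than its speed.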

The detection performance can be quantified by the 
Area under the ROC (Receiver Operating Characteris- 
tic) curve (AUC). The ROC curve is a plot of the true 
positive rate against the false positive rate, and can 
be used to help make the tradeoff between these two 
rates. The curve is constructed by treating time points 
as representative samples, all of which are implicitly 
used to compute the true and false positive rates. The 
AUC is loosely a measure of accuracy over all possible 
tradeoffs between the true positive rate and the false 
positive rate, computed by numerically integrating the 
area under the ROC curve. More formally, the AUC 
represents the probability that a randomly chosen fail- 
ure data point is more suspect than a randomly chosen 
nominal data point (Rosset, 2004). An AUC of one 
thus indicates perfect ranking of these two randomly 
selected data points. 
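
The ranking interpretation of the AUC can be made concrete with a short Python sketch. It counts, over all pairs, how often a failure point receives a higher anomaly score than a nominal point, with ties counted as one half. The function name and the sample scores are hypothetical, and the quadratic pairwise loop is chosen for clarity rather than efficiency.

```python
def auc_by_ranking(nominal_scores, failure_scores):
    """Probability that a random failure point outranks a random nominal point."""
    wins = 0.0
    for f in failure_scores:
        for n in nominal_scores:
            if f > n:
                wins += 1.0      # failure point is more suspect
            elif f == n:
                wins += 0.5      # ties count as half
    return wins / (len(failure_scores) * len(nominal_scores))


# Hypothetical anomaly scores for illustration.
print(auc_by_ranking([0.0, 0.2, 0.4], [0.3, 0.5]))
```

A value of one means every failure point scores above every nominal point, and a value of one half corresponds to scores that carry no ranking information.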

As such, Fig. 3 demonstrates how detection performance
varies across a range of leak radii for both the
high and low fidelity simulations of FSM leaks, using
Shuttle data at the pad as the sole exemplar. Detection
performance using Shuttle data from the VAB is poorer
than that at the pad for reasons to be described in
Sec. 4. These reasons are also specific to the FSM leak
failure mode 1a, but otherwise performance using the
VAB Shuttle data exhibits the same tendencies as performance
using data from the pad.

Two main observations can be made regarding Fig.
3. First, it is evident that detection performance, as
quantified by the AUC, improves as the leak radius
increases, regardless of the simulation fidelity. This
meets with intuition, since a faster leak should be more
easily and quickly detectable. The second observation
is that the detection performance
of the high and low fidelity simulations converges
as the leak radius increases. This also meets with intuition,
since a faster leak can be more easily approximated
in a linear fashion. However, as we can tell from
the error bars, there is quite a bit of overlap between
the high and low fidelity simulation methods for fast
leaks, and even for slow leaks. Thus, there is no appreciable
difference between the detection performance
results for the low and high fidelity simulations, and as
such we will not make the distinction between the two
for the remainder of the paper.

Figure 3: High vs. Low Fidelity Simulation Detection
Performance

4 ANOMALY DETECTION 

As mentioned previously, IMS works under the prin- 
ciple of semi-supervised anomaly detection by build- 
ing a model of the nominal historical data on which 
it is trained. Because IMS only models the nominal 
data, and does not model any failure modes, it can 
potentially detect unknown failure modes. The model 
takes the form of a knowledge base (KB) of clusters. 
Once the KB has been learned, unseen data points are 
evaluated against the KB and assigned anomaly scores 
based on how anomalous the data points are with re- 
spect to the training data. If a new point falls within 
an existing cluster, then it is assigned an IMS score of 
zero. If it does not fall within an existing cluster, then 
the distance to the nearest cluster is used as the IMS 
score. When an anomalous period of the testing data 
is localized, the contributing IMS scores can be iden- 
tified, helping to diagnose the issue. Prior to the Ares 
I-X launch, we trained IMS on historical Space Shut- 
tle data, and tested it using historical Shuttle data into 
which we had inserted simulated failures. During the 
Ares I-X pre-launch period, IMS processed live Ares 
I-X data, using a knowledge base that was the result of 
training IMS on historical Shuttle data. The remainder 
of this section describes the selection of measurements
for use with IMS, the training and testing procedures
used, and the results obtained both on Shuttle data and
on Ares I-X data. The section concludes with a summary
of the results.
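
The scoring rule described in this section (zero inside a cluster, otherwise the distance to the nearest cluster) can be sketched in Python. This is not the IMS implementation: it assumes, for illustration only, that each cluster is stored as an axis-aligned box of per-parameter lower and upper bounds and that distance is Euclidean. The function names and the sample knowledge base are hypothetical.

```python
import math


def distance_to_box(point, low, high):
    """Euclidean distance from point to an axis-aligned box; 0.0 if inside."""
    squared = 0.0
    for x, lo, hi in zip(point, low, high):
        if x < lo:
            squared += (lo - x) ** 2
        elif x > hi:
            squared += (x - hi) ** 2
    return math.sqrt(squared)


def anomaly_score(point, knowledge_base):
    """Zero if the point falls in any cluster, else distance to the nearest one."""
    return min(distance_to_box(point, lo, hi) for lo, hi in knowledge_base)


# Hypothetical knowledge base of two clusters over two parameters.
kb = [([0.0, 0.0], [1.0, 1.0]), ([4.0, 4.0], [5.0, 5.0])]
print(anomaly_score([0.5, 0.5], kb), anomaly_score([1.0, 3.0], kb))
```

In a representation like this one, the per-parameter terms inside `distance_to_box` indicate which measurements push a point away from the nominal clusters, which is one way the contributing scores mentioned above could be examined.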

4.1 Parameter Selection 

For Ares I-X data to be compatible with historical
Shuttle data, a common set of parameters needed to be
chosen. During the Shuttle analysis on chosen simulated
faults, all continuous-valued parameters were
selected along with one discrete parameter that was
known to be critical in detecting one of the three failure
simulations, for a total of 137 parameters. The choice
of mostly using continuous parameters was made because
historically IMS has performed better when operating
on mostly continuous sensor values. After running
an analysis on the failure simulations, some false
alarms were detected and an additional set of parameters
was eliminated, leaving 102. For the purposes of
feature selection (parameter elimination), a false alarm
is defined qualitatively as a large excursion above an
apparent “baseline” in the composite score produced
by IMS, which characterizes the anomalousness of a
specific point in the time series. With the elimination
of these parameters, the false alarms were significantly
reduced. When the first set of Ares I-X VAB data was
recorded, a common subset was selected between the
Ares I-X parameter set and the 102 parameters from
the Shuttle, resulting in the 33 parameters used for analysis
on the Ares I-X data.

4.2 Training and Testing Procedures 

For the purpose of training and testing IMS, we used
historical Space Shuttle data into which we inserted
simulated failures, with varying rates of degradation,
and spanning fixed time periods in a random fashion.
Although the main purpose of using IMS in the
Ground Diagnostics Prototype is to detect unknown
failures, we tested it by using simulations of known
failures. (For obvious reasons, we were unable to simulate
unknown failures.) IMS has a number of tunable
input parameters; however, one key parameter that was
very important to tune was the maximum interpretation
(max interp) parameter. This parameter governs
the threshold in the learning phase that determines if
a new data point should be placed in the current cluster
or used to generate a new cluster. The parameter
directly influences the number of clusters created in
the learning phase and therefore has a major influence
on the final anomaly score calculated by IMS. As the
max interp value increases, the total number of clusters
formed becomes smaller.
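
The effect of the max interp threshold on cluster count can be illustrated with a simplified incremental clustering sketch in Python. It is not the IMS learning algorithm: it assumes box-shaped clusters, Euclidean distance, and a rule that a point within `max_interp` of its nearest cluster expands that cluster while any other point starts a new one. All names and data are hypothetical.

```python
import math


def _distance(point, low, high):
    """Euclidean distance from point to the box [low, high]; 0.0 if inside."""
    return math.sqrt(sum(max(lo - x, 0.0, x - hi) ** 2
                         for x, lo, hi in zip(point, low, high)))


def learn_clusters(training_points, max_interp):
    """Build box clusters incrementally from nominal training points."""
    clusters = []  # each cluster is a two-element list: [low, high]
    for p in training_points:
        nearest, nearest_d = None, None
        for c in clusters:
            d = _distance(p, c[0], c[1])
            if nearest_d is None or d < nearest_d:
                nearest, nearest_d = c, d
        if nearest is not None and nearest_d <= max_interp:
            # Close enough: stretch the nearest cluster to cover the point.
            nearest[0] = [min(lo, x) for lo, x in zip(nearest[0], p)]
            nearest[1] = [max(hi, x) for hi, x in zip(nearest[1], p)]
        else:
            # Too far from every cluster: the point seeds a new one.
            clusters.append([list(p), list(p)])
    return clusters


# Hypothetical one-parameter training data.
points = [(0.0,), (0.1,), (0.2,), (1.0,), (1.1,)]
for threshold in (0.05, 0.15, 1.0):
    print(threshold, len(learn_clusters(points, threshold)))
```

On this toy data the cluster count falls as the threshold grows, mirroring the relationship described above. Fewer, larger clusters give a looser model of nominal behavior, which is why the threshold has to be tuned against detection performance.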

To determine the optimal max interp value and corresponding
number of clusters, a set of cross validation
runs was performed on a set of Shuttle VAB and pad
data, using the AUC as the governing metric for optimization.
Cross validation is a technique for estimating
the accuracy of a machine learning algorithm by
training and testing the algorithm multiple times, each
time using different subsets of the available data for
training and testing, and then averaging the results.
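
The selection procedure can be sketched generically in Python. The sketch assumes a user-supplied `evaluate(candidate, train, test)` callable that trains on one subset and returns an AUC on the other. That callable, the contiguous fold layout, and the toy values below are hypothetical and stand in for the actual IMS training and scoring steps.

```python
def k_fold_indices(n_items, k):
    """Split range(n_items) into k contiguous folds of near-equal size."""
    folds, start = [], 0
    for i in range(k):
        size = n_items // k + (1 if i < n_items % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def select_by_cross_validation(candidates, data_sets, k, evaluate):
    """Return (best_candidate, mean_scores) using k-fold cross validation."""
    folds = k_fold_indices(len(data_sets), k)
    mean_scores = {}
    for cand in candidates:
        total = 0.0
        for held_out in folds:
            test = [data_sets[i] for i in held_out]
            train = [data_sets[i] for i in range(len(data_sets))
                     if i not in held_out]
            total += evaluate(cand, train, test)   # e.g. AUC on the held-out fold
        mean_scores[cand] = total / k
    best = max(mean_scores, key=mean_scores.get)
    return best, mean_scores


# Toy stand-in for "train, score, compute AUC"; peaks at 0.13 by construction.
def toy_auc(candidate, train, test):
    return 1.0 - abs(candidate - 0.13)


best, scores = select_by_cross_validation([0.05, 0.13, 0.20], list(range(7)), 3, toy_auc)
print(best)
```

Averaging over folds reduces the chance that the chosen threshold is tuned to the quirks of a single data set.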

4.3 Results on Shuttle Simulations 

Once the cross validation runs were complete, the areas
under the ROC curves were calculated using data
that spans the time that the Shuttle was still in the VAB.

Figure 4: AUC as a function of IMS Parameter Max
Interp for Shuttle data from the VAB

Figure 5: ROC Curve for Optimal Max Interp for Shuttle
data from the VAB

Figure 4 shows the maximum, minimum, and average
AUC over the three-fold cross validations and three
fault scenarios (listed as failure modes 1a, 2, and 3 in
Table 1) for each max interp value. The optimal max
interp value that was chosen is marked in the plots.
The mean AUC with the highest value is 0.86893, and
corresponds to the optimal max interp value of 0.13,
which can be seen in Figure 4. The ROC curve associated
with this optimized max interp value can be
seen in Figure 5. The relatively modest detection performance
at the VAB can be attributed to the fact that
IMS had difficulty detecting simulated failure 1a. This
difficulty stemmed from the fact that the increase in
IMS score resulting from this simulated failure was
not much larger than the nominal variation in the IMS
score, so it was not possible to select a threshold that
would allow IMS to detect all of the simulated failures
without increasing the number of false alarms. Thus,
some failure modes are easily detected using IMS’
distance-based approach with clustering, while others
are not. When IMS is used in parallel with TEAMS- 
RT, TEAMS-RT should detect all of the failures that 
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Figure 6: Ares I-X VAB Global IMS Score with False 
Alarms 


are modeled in the TEAMS model; the advantage of 
using IMS in addition is that it has the potential to de- 
tect failures that were not modeled, as well as anoma- 
lies that are not yet failures. For the pad, the AUC is 
0.99919, indicating that IMS does an excellent job of 
detecting the two simulated failure modes 1a and 3 at 
the pad, and performs much better than at the VAB. An 
intuitive explanation for this discrepancy relates to the 
fact that at the pad only a small portion of the data has 
high “activity,” during the last minute before launch. 
However, data from quiescent periods previous to the 
last minute before launch are also used for analysis. 
As such, this translates to a lower signal to noise ra- 
tio, which directly influences the AUC, resulting in a 
higher value and thus fewer false positives. 
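
The selection procedure described above (compute an AUC for each fold and
fault scenario, average over them for each candidate max interp value, and
keep the value with the highest mean) can be sketched as follows. This is
a minimal illustration, not the code used in the study: the function names
`auc_from_scores` and `pick_max_interp` are hypothetical, and all scores
and AUC values in the example are invented.

```python
from statistics import mean

def auc_from_scores(nominal, faulty):
    """AUC as the probability that a faulty sample outscores a nominal
    one (ties count one half); this equals the area under the ROC curve."""
    wins = 0.0
    for f in faulty:
        for n in nominal:
            if f > n:
                wins += 1.0
            elif f == n:
                wins += 0.5
    return wins / (len(faulty) * len(nominal))

def pick_max_interp(auc_by_value):
    """auc_by_value maps a candidate max interp value to the AUCs obtained
    over folds and fault scenarios. Returns (best value, its mean AUC)."""
    best = max(auc_by_value, key=lambda v: mean(auc_by_value[v]))
    return best, mean(auc_by_value[best])

# Invented anomaly scores for nominal and simulated-failure samples.
nominal_scores = [0.1, 0.2, 0.3, 0.4]
faulty_scores = [0.35, 0.5, 0.6]
example_auc = auc_from_scores(nominal_scores, faulty_scores)

# Invented per-fold AUCs for three candidate max interp values.
candidates = {
    0.05: [0.70, 0.80, 0.75],
    0.13: [0.85, 0.90, 0.86],
    0.30: [0.80, 0.82, 0.78],
}
best_value, best_mean = pick_max_interp(candidates)
print(example_auc, best_value, best_mean)
```

When the score increase caused by a failure is comparable to the nominal
variation in the score, many faulty samples fail to outscore nominal
ones, which lowers the AUC regardless of where the threshold is placed.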

4.4 Results on Ares I-X 

Once the optimal max interp parameter was deter- 
mined from the Shuttle data, IMS was trained on 33 
measurements using Shuttle data from seven flights, 
which also represents the greatest common subset cor- 
responding to equivalent Ares I-X measurements. Af- 
ter building the knowledge base, the Ares I-X data was 
evaluated against it, and ostensibly acts as hold out test 
data from a machine learning standpoint. The resulting 
IMS scores for the VAB are shown in Figure 6. With 
the initial set of 33 measurements, 3 periods of anoma- 
lous behavior were flagged by IMS; they are labeled 
as three “False Alarms” in Figure 6. We performed 
an analysis of each “false alarm”; here we present the 
analysis of False Alarm 1 as an example. We deter- 
mined that False Alarm 1 was primarily caused by two 
measurements. The contributing IMS scores for these 
two measurements are plotted in Figure 7. 

False Alarm 1 was caused by a difference between 
the Space Shuttle and Ares I-X data. In recent years, 
the TVC actuator tests performed in the VAB have all 
been “pinned” tests, meaning that the actuator is phys- 
ically pinned to the nozzle during testing, so that the 
nozzle moves during the test. The first TVC actuator 
position test performed in the VAB for Ares I-X was 
an “unpinned” test, meaning that the actuator was de- 
tached from the nozzle, and the nozzle did not move 
during the test. 

Figure 7: VAB Top Contributing IMS Scores For False Alarm 1 

Because the actuator was unpinned, it was able to 
move through a larger range of motion 
that is not possible during pinned testing. IMS there- 
fore saw rock and tilt position values that it had never 
seen in the Shuttle data, which it flagged as anomalies. 
These anomalies are “false alarms” in the sense that 
they are not failures, but they do illustrate the ability 
of IMS to detect new data that is different from what it 
has seen before. We performed a similar analysis for 
the pad, where there were fewer anomalies identified 
by IMS. Like the anomalies detected at the VAB, the 
anomalies detected at the launch pad were caused by 
operational differences between Shuttle and Ares I-X. 
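
The way out-of-range values raise the score, and the way a few
measurements can dominate it, can be illustrated with a small sketch. It
assumes that each cluster is an axis-aligned box of per-measurement
minimum and maximum values and that the score is the Euclidean distance
to the nearest box; this section does not spell out either detail, so
both are assumptions. The function names, measurement names, and numbers
are hypothetical.

```python
import math

def box_distance(point, box):
    """Return (distance, per-measurement gaps) from a point to an
    axis-aligned box given as a list of (low, high) bounds."""
    gaps = []
    for x, (low, high) in zip(point, box):
        if x < low:
            gaps.append(low - x)
        elif x > high:
            gaps.append(x - high)
        else:
            gaps.append(0.0)  # inside the range seen in training
    return math.sqrt(sum(g * g for g in gaps)), gaps

def score_with_contributions(point, boxes, names):
    """Score a point against its nearest box and rank the measurements
    by how far each one lies outside that box."""
    dist, gaps = min((box_distance(point, b) for b in boxes),
                     key=lambda result: result[0])
    ranked = sorted(zip(names, gaps), key=lambda item: item[1], reverse=True)
    return dist, ranked

# Hypothetical measurements and one hypothetical training cluster.
names = ["rock_position", "tilt_position", "supply_pressure"]
boxes = [[(-1.0, 1.0), (-1.0, 1.0), (0.9, 1.1)]]

# Rock and tilt positions beyond anything in the training data.
dist, ranked = score_with_contributions([4.0, -5.0, 1.0], boxes, names)
print(dist, ranked)
```

Under these assumptions, a point whose rock and tilt positions exceed the
trained range yields a large score driven by those two measurements,
while the in-range measurement contributes nothing.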

4.5 Summary of IMS Deployment Results 

The experiments that we ran before the Ares I-X 
launch using historical Space Shuttle data with sim- 
ulated failures demonstrated that IMS is able to de- 
tect most of the simulated failures, but not all of them. 
In particular, it had difficulty detecting the simulated 
failure mode 1a in the VAB due to its relatively small 
contribution to the overall IMS anomaly score com- 
pared to the other two simulated failure modes. IMS is 
not trained to detect specific failure modes; it detects 
data that is anomalous according to its cluster-based 
model. We expect that many known and unknown fail- 
ure modes will be detected as anomalies by IMS, but it 
is not guaranteed to detect all possible failure modes. 
The advantage of using IMS together with a model- 
based diagnosis system such as TEAMS is that it adds 
the potential to detect unknown failure modes and to 
detect precursors of failures. 

The results of running IMS on Ares I-X data, using 
a knowledge base that was trained on historical 
Space Shuttle data, confirm our hypothesis that the 
Ares I-X TVC data is reasonably similar to the Space 
Shuttle SRB TVC data. Most of the time IMS pro- 
duced small anomaly scores when run on the Ares I-X 
data. IMS did detect some “anomalies” in the Ares I- 
X data. These anomalies were “false alarms” in the 
sense that they were not failures but rather caused by 
operations performed differently for Ares I-X versus 
Shuttle; hence, they illustrate the ability of IMS to de- 
tect new data that is different from what has been seen 
in the past. 


5 IMS/TEAMS PERFORMANCE 
COMPARISON 

5.1 Anomalies Detected by IMS That Were Not 
Detected by TEAMS 

We have seen that IMS detected some interest- 
ing anomalies that were not detected by TEAMS 
because they were not failures as defined in the 
FMEA (Failure Modes and Effects Analysis) and the 
other documents on which the TEAMS models were 
based. One such anomaly was the pinned/unpinned 
actuator anomaly mentioned previously. In the 
pinned/unpinned anomaly there were procedural dif- 
ferences between the TVC test for the Shuttle and Ares 
I-X, resulting in IMS signaling an anomaly in the TVC 
rock and tilt actuator positions. This anomaly was 
not a failure; hence it was detected by IMS but not by 
TEAMS. Furthermore, we found other differences between 
the Shuttle and Ares I-X actuator tests: the test sequence 
was changed slightly and the maximum displacement was 
greater. Ostensibly, these differences had an even greater 
effect on IMS than the pinned/unpinned variation alone. 

5.2 Failures Detected Earlier by IMS Than by 
TEAMS 

Table 2 summarizes the detection times for the sim- 
ulated failures that were detected both by IMS and 
TEAMS in minutes after injection of the failure. A hy- 
pothesized advantage of IMS is that it may detect cer- 
tain failures before TEAMS. However, on average the 
results show that TEAMS detected failures prior to the 
time that IMS did. On two occasions, IMS was able to 
detect a simulated failure prior to TEAMS, as shown in 
red in Table 2. In the case of failure 3, which was simu- 
lated with a simple bit flip at the VAB, the detection oc- 
curred at approximately the same time. The other two 
failures are more complicated, and are described by 
gradual ramps of continuous-valued parameters rather 
than instantaneous bit flips of discrete-valued parameters, 
which accounts for the notable differences in detection 
times. It can be seen from Table 2 that IMS sometimes 
detected failures earlier than TEAMS did, but more often 
it detected them later. There may be some advantage 
to running IMS in parallel with TEAMS in order 
to provide earlier detection of some failures. Another 
observation worth noting is that there appears to be a 
wider variance for the IMS detection latencies for a 
given failure simulation spanning several flights. This 
is consistent with the fact that TEAMS detection times 
are based purely upon logic rather than statistics, the 
latter of which IMS incorporates in its detection capability. 

Table 2: Summary of simulated failure detection times 

Failure  Flight   Trial  IMS Detection  TEAMS Detection  Difference
                         Time (min)     Time (min)       (min)
1a       STS-107  1      8.77           2.24             6.53
1a       STS-107  2      1931           304              174747
1a       STS-107  3      B74S           09               12709
1a       STS-112  1      217.37         2.2              215.14
1a       STS-112  2      337            5704             3777
1a       STS-112  3      222.07         1.38             220.7
1a       STS-120  1      039            2723             -04
1a       STS-120  2      8.69           5.02             3.67
1a       STS-120  3      3.45           OS               2707
2        STS-112  1      1.76           1.5              0.26
2        STS-112  2      TUI            2.33             1.68
2        STS-112  3      377S           2.4              OS
2        STS-120  1      4792           2.57             2.35
2        STS-120  2      332            2732             1.3
2        STS-120  3      339            2739             1.5
3        STS-112  1      0              0                0
3        STS-112  2      0              0                0
3        STS-112  3      0              0                0
3        STS-120  1      0              0                0
3        STS-120  2      0              0                0
3        STS-120  3      0              0                0
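
Detection times of the kind reported in Table 2 are measured in minutes
after failure injection, and the legible rows are consistent with the
difference being the IMS time minus the TEAMS time. A minimal sketch of
that bookkeeping follows; the function names, the threshold, and the
sample data are hypothetical and are not taken from the experiments.

```python
def detection_delay(times_min, flags, injection_min):
    """Minutes from failure injection to the first raised flag at or
    after injection; None if the failure is never flagged."""
    for t, flagged in zip(times_min, flags):
        if t >= injection_min and flagged:
            return t - injection_min
    return None

def delay_difference(ims_delay, teams_delay):
    """IMS delay minus TEAMS delay; negative means IMS detected first."""
    return ims_delay - teams_delay

# Invented example: one sample per minute, failure injected at minute 2.
times = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ims_scores = [0.1, 0.1, 0.2, 0.4, 0.9, 1.2]
threshold = 0.5  # hypothetical alarm threshold on the IMS score
ims_flags = [score > threshold for score in ims_scores]
teams_flags = [False, False, False, True, True, True]

ims_delay = detection_delay(times, ims_flags, injection_min=2.0)
teams_delay = detection_delay(times, teams_flags, injection_min=2.0)
print(ims_delay, teams_delay, delay_difference(ims_delay, teams_delay))
```

In this toy case the score of a gradually ramping failure crosses the
threshold one minute after the logic-based flag is raised, so the
difference is positive.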

5.3 Failures Detected By TEAMS That Were Not 
Detected By IMS 

IMS occasionally misses simulated failures, usually as 
a function of the fine tuning required to mitigate spe- 
cific instances of false alarms on test (Ares I-X) data. 
This fine tuning involves varying the number of clusters 
in the knowledge base, the measurements (sensor 
values) represented in the knowledge base, as well as 
the threshold or qualitative heuristic used following the 
application of ROC analysis. Typically, ROC curves 
span multiple failures, but are based on only a few 
Shuttle flights for training data. As such, when 
applying the resulting knowledge base to unseen hold 
out test (e.g. Ares I-X) data, simulated failures may 
not be detected. In fact, great measures may need to 
be taken in order for such failures to be detected, of- 
ten at the expense of false alarms, as is apparent in the 
examples of false alarms presented previously. 

5.4 Failures More Appropriate For Modeling 
With TEAMS 

Anomaly detection methods such as IMS are not well 
suited for detecting some types of failures. As men- 
tioned previously, we used simulations of known fail- 
ure modes to test IMS. For some of these simulated 
failures, we expended considerable effort tuning IMS 
to detect them. This tuning process included reducing 
the set of measurements that were used to train IMS. 
For failure mode 4, a simulated 
failure covering a stuck actuator during a simulated 
positioning test at the VAB, almost all measurements 
other than the one required to simulate the failure had 
to be excluded in order to provide adequate detection 
capability. For this same case, a linear regression was 
required to construct a proxy for the commanded position 
from a commanded current measurement, because the 
requisite electromechanical conversion data were not 
available. The difference 
between the quasi-commanded position and the actual 
measured position was then used as the sole parameter 
with which to train and test IMS. Any additional 
measurements included in the knowledge base resulted in 
a missed detection. This is a case in which IMS was 
clearly not a good choice for detecting the particular 
failure mode. 
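
The regression step described above can be sketched as follows. The
sketch assumes an ordinary least-squares line fitted on nominal data,
mapping commanded current to position, whose output serves as the
quasi-commanded position; the paper does not state how the regression
was fitted, so this is an assumption, and the function names and
numbers are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def position_residuals(current, measured_position, slope, intercept):
    """Quasi-commanded position minus measured position, per sample."""
    return [slope * c + intercept - p
            for c, p in zip(current, measured_position)]

# Invented nominal data: position tracks commanded current linearly.
nominal_current = [0.0, 1.0, 2.0, 3.0]
nominal_position = [1.0, 3.0, 5.0, 7.0]
slope, intercept = fit_line(nominal_current, nominal_position)

# Invented stuck-actuator data: position stops responding after sample 2.
stuck_position = [1.0, 3.0, 3.0, 3.0]
residual = position_residuals(nominal_current, stuck_position,
                              slope, intercept)
print(slope, intercept, residual)
```

The residual stays near zero while the actuator follows the command and
grows once it sticks, which is why a single derived parameter can expose
this failure when the raw measurements do not.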

Cases such as these serve as evidence that each tool 
should be leveraged to promote its strengths rather 
than re-adapting the tool to solve a problem that is 
outside of its domain of relevance. With IMS, we 
know that its strengths lie in a great potential to detect 
faults that are unknown or that otherwise have not been 
modeled and to detect anomalies that are precursors of 
faults before a model-based system detects the fault. 
We believe that it would be better to rely on TEAMS to 
detect the known failure mode described above, rather 
than tuning IMS to detect it. Reducing the set of mea- 
surements that are used to train IMS did allow IMS 
to successfully detect the simulated failures, but it re- 
duced IMS’ potential to detect other unknown failures. 

6 SUMMARY AND CONCLUSIONS 

As mentioned previously, we believe including a semi- 
supervised data-driven anomaly detection algorithm 
such as IMS alongside a model-based diagnosis system 
such as TEAMS in a diagnostic system adds significant 
value, when used appropriately. Doing so will 
allow the overall anomaly detection system to be en- 
dowed with the potential to detect anomalies that can- 
not be detected by the model-based diagnosis system 
in isolation, either because they are unknown failures 
and therefore unmodeled, or because they are not fail- 
ures. Furthermore, IMS may detect known failures in 
advance of the time that TEAMS would detect them, 
and in general IMS requires less modeling effort than 
TEAMS (although it does require a sufficient quantity 
of historical and/or simulated training data). 

It was also important to consider the ability of IMS 
to detect failures in a true operational setting, but there 
was a dearth of true failures resembling those that we 
simulated with which to conduct experiments. There- 
fore, we hoped to demonstrate an improved ability of 
IMS to detect simulated failures that may be more re- 
alistic by increasing their fidelity. We have found that 
for fast FSM leaks, robust detection performance im- 
proves for both the high and low fidelity simulations, 
and the performance for both types of simulations also 
converges as the leak rate increases. However, over- 
all we have also observed that there is no appreciable 
difference between the effect of using a low or high 
fidelity simulation on detection performance. 